Based on billions of words on the internet, people = men

Recent advances have made it possible to precisely measure the extent to which any two words are used in similar contexts. In turn, this measure of similarity in linguistic context also captures the extent to which the concepts being denoted are similar. When extracted from massive corpora of text written by millions of individuals, this measure of linguistic similarity can provide insight into the collective concepts of a linguistic community, concepts that both reflect and reinforce widespread ways of thinking. Using this approach, we investigated the collective concept person/people, which forms the basis for nearly all societal decision- and policy-making. In three studies and three preregistered replications with similarity metrics extracted from a corpus of over 630 billion English words, we found that the collective concept person/people is not gender-neutral but rather prioritizes men over women—a fundamental bias in our species’ collective view of itself.

Note. The words that do not have ratings were added after the rating study was conducted because of suggestions from the coders, as described in the Materials and Methods section of the main text.
Note. The words that do not have ratings were added after the rating study was conducted, either because of suggestions from the coders or to parallel an other-gender counterpart already on the list, as described in the Materials and Methods section of the main text.

Study 2B

Additional Methodological Details of the Findings Reported in the Main Text in Study 2B
Our final word list consisted of 178 trait words with gender stereotypicality designations (Table S4). The gender words were the same as in Study 1 (Table S2).

Additional Analytic Details of the Findings Reported in the Main Text in Study 2B
As reported in the main text with respect to our first prediction, we found that, overall, trait words were more similar in their usage to words for MEN (M = 0.15, SD = 0.05) than to words for WOMEN (M = 0.14, SD = 0.05), B = 0.01, SE < 0.01, p < .001, d = 0.19. This result was the output of a mixed-effects linear regression with gender (words for MEN, words for WOMEN; categorical variable) predicting cosine similarity to traits, with a random intercept for each trait word.
As reported in the main text with respect to our second prediction, we found that the cosine similarity of the 178 trait words with words for MEN and, separately, words for WOMEN depended on gender stereotypicality of the traits (i.e., there was an interaction), B = 0.02, SE < 0.01, p < .001. Specifically, words for MEN did not differ significantly in their similarity to traits stereotypical of men (M = 0.15, SD = 0.04) and to traits stereotypical of women (M = 0.14, SD = 0.05), B < 0.01, SE = 0.01, p = .807, d = 0.04. In contrast, words for WOMEN were more similar to traits stereotypical of women (M = 0.14, SD = 0.05) than to traits stereotypical of men (M = 0.13, SD = 0.05), B = -0.01, SE = 0.01, p = .049, d = -0.30. This result was the output of a mixed-effects linear regression with gender word (words for MEN, words for WOMEN; categorical variable), trait stereotypicality (stereotypical of men, stereotypical of women; categorical variable), and their interaction predicting cosine similarity to traits, with a random intercept for each trait. We followed up on the significant interaction within the same model using simple slopes analyses.
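To make the dependent variable concrete, the following is a minimal sketch of how a cosine similarity between a trait word and a set of gender words could be computed. The three-dimensional vectors, the word lists, and the helper names `cosine` and `mean_similarity` are hypothetical and chosen only for illustration; the analyses reported here used 300-dimensional embeddings and a mixed-effects regression, which this sketch does not reproduce.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def mean_similarity(target_vec, word_set, embeddings):
    """Average cosine similarity of one target word to every word in a set."""
    sims = [cosine(target_vec, embeddings[w]) for w in word_set]
    return sum(sims) / len(sims)

# Toy vectors (hypothetical values, not taken from fastText or GloVe).
embeddings = {
    "he": [1.0, 0.0, 0.0],
    "him": [0.8, 0.6, 0.0],
    "she": [0.0, 1.0, 0.0],
    "her": [0.0, 0.8, 0.6],
    "brave": [1.0, 1.0, 0.0],
}
men_words = ["he", "him"]
women_words = ["she", "her"]

trait = embeddings["brave"]
sim_men = mean_similarity(trait, men_words, embeddings)
sim_women = mean_similarity(trait, women_words, embeddings)
print(round(sim_men, 3), round(sim_women, 3))
```

In this toy example the trait vector sits closer to the MEN vectors than to the WOMEN vectors, which is the direction of difference that the regression tests across the full word lists.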
Note. a Designated as stereotypical of men. b Designated as stereotypical of women. Gender stereotypicality designation was taken from reference c 49, d 40, e 50, f 47, g 48, but note that many traits were repeated across multiple sources.

Study 3

Additional Methodological Details of the Findings Reported in the Main Text in Study 3
The final word list consisted of 252 cases of verbs with male-biased vs. female-biased designations. Note that these 252 cases of verbs corresponded to 211 unique verbs; there were some repetitions based on differing valence or subject vs. object position of the gender bias as explained in the Materials and Methods section of the main text (Table S5). The gender words were the same as in Study 1 (Table S2).

Additional Analytic Details of the Findings Reported in the Main Text in Study 3
Regarding our first prediction, as reported in the main text, we found that verbs were overall more similar in their usage to words for MEN (M = 0.11, SD = 0.04) than to words for WOMEN (M = 0.10, SD = 0.04), B = 0.01, SE < 0.01, p < .001, d = 0.26. This result was the output of a mixed-effects linear regression with gender words (words for MEN, words for WOMEN; categorical variable) predicting cosine similarity to verbs, with a random intercept for each verb.
Regarding our second prediction, as reported in the main text, we also found that the cosine similarity of the 252 verbs with words for MEN and, separately, words for WOMEN depended on gender bias of the verbs (i.e., there was an interaction), B = 0.01, SE < 0.01, p < .001. Specifically, words for MEN did not differ significantly in their similarity to male-biased verbs (M = 0.11, SD = 0.04) and to female-biased verbs (M = 0.11, SD = 0.04), B = -0.01, SE = 0.01, p = .128, d = -0.20. In contrast, words for WOMEN were more similar to female-biased verbs (M = 0.11, SD = 0.05) than to male-biased verbs (M = 0.09, SD = 0.03), B = -0.02, SE = 0.01, p < .001, d = -0.54. This result was the output of a mixed-effects linear regression with gender word (words for MEN, words for WOMEN; categorical variable), verb syntactic bias (male-biased, female-biased; categorical variable), and their interaction predicting cosine similarity to verbs, with a random intercept for each verb. We followed up on the significant interaction within the same model using simple slopes analysis.

Exploratory Analyses in Study 3
The list of 252 verbs was taken from prior work that, in addition to identifying the syntactic gender bias of each verb, indicated the valence (i.e., sentiment) of the verb as positive, negative, or neutral and indicated whether the verb's gender bias occurred with arguments in the subject or object position (51). In two sets of exploratory analyses, we tested whether the findings in the present study were further moderated by valence or by the syntactic position in which the gender bias occurred.
Valence of the Verb. To test the potential moderating effect of valence, we first compared a mixed-effects linear regression with gender word (words for MEN, words for WOMEN; categorical variable), valence of the verb (negative, positive, or neutral; categorical variable), and their interaction terms predicting cosine similarity to verbs, with a random intercept for each verb, to an identical model that omitted the interaction terms. There was no evidence that the model with interaction terms explained significantly more variance than the model without the interaction terms, χ²(2) < 0.01, p > .999, indicating that valence did not moderate the difference between words for MEN and words for WOMEN.
Second, we compared a mixed-effects linear regression with gender word (words for MEN, words for WOMEN; categorical variable), verb syntactic gender bias (male-biased, female-biased; categorical variable), verb valence (negative, positive, or neutral; categorical variable), and their interaction terms predicting cosine similarity to verbs, with a random intercept for each verb, to an identical model but without the higher-order valence interaction terms. There was no evidence that the model with the valence interaction terms explained more variance, χ²(6) < 0.01, p > .999. Thus, in both of these analyses, there was no evidence that the valence of the verb moderated either the overall difference between words for MEN and words for WOMEN or the interaction effect between the gender words and the verb syntactic gender bias.
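The nested-model comparisons above yield a chi-square statistic with degrees of freedom equal to the number of omitted interaction terms. The sketch below shows one common way such a comparison is computed, as a likelihood ratio test; the text does not specify the software or exact procedure used, so this is an assumption. The log-likelihood values are hypothetical placeholders, and the helper names are ours. The closed-form tail probability used here is valid only for even degrees of freedom, which covers the two cases reported (2 and 6).

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable (closed form, even df only)."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term = 1.0
    total = 1.0
    # Sum of (x/2)^i / i! for i = 0 .. df/2 - 1
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

def likelihood_ratio_test(loglik_reduced, loglik_full, df):
    """Return the likelihood ratio statistic and its chi-square p value."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    return stat, chi2_sf_even_df(stat, df)

# Hypothetical log-likelihoods for a model without and with interaction terms.
stat, p = likelihood_ratio_test(loglik_reduced=1500.000, loglik_full=1500.004, df=2)
print(round(stat, 3), round(p, 3))
```

A statistic this close to zero means the extra interaction terms barely change the fit, which is the pattern reported for valence.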

Syntactic Position of the Verb's Gender Bias. To test the potential moderating effect of the syntactic position in which the gender bias occurred for the verbs, we first conducted a mixed-effects linear regression with gender word (words for MEN, words for WOMEN; categorical variable), verb syntactic position of the bias (subject, object; categorical variable), and their interaction term predicting cosine similarity to verbs, with a random intercept for each verb. The interaction between gender and syntactic position was not significant, B < 0.01, SE < 0.01, p = .142, indicating that syntactic position did not moderate the difference between words for MEN and words for WOMEN.
Second, we conducted a mixed-effects linear regression with gender word (words for MEN, words for WOMEN; categorical variable), verb syntactic gender bias (male-biased, female-biased; categorical variable), syntactic position of the bias (subject, object; categorical variable), and their interaction terms. The interaction between gender word, verb syntactic gender bias, and syntactic position was not significant, B < 0.01, SE = 0.01, p = .722. Thus, in both of these analyses, there was no evidence that the syntactic position of the gender bias for each verb moderated either the overall difference between words for MEN and words for WOMEN or the interaction effect between the gender words and the verb syntactic gender bias.

Overview of Replication Studies
We conducted direct, preregistered replications of Studies 1-3. Each replication used identical lists of words and other procedures to Studies 1-3, respectively, with one exception: We used a different set of word embeddings. The goal of these replications was to test whether the present findings are robust to incidental details in the algorithms used to create the word embeddings.
In Studies 1-3 (main text), we used 300-dimensional fastText embeddings extracted from the Common Crawl corpus. For the present replication studies, we used 300-dimensional Global Vectors for Word Representation (GloVe) embeddings (7), also trained on the Common Crawl corpus.

Replication of Study 1
We compared words for PEOPLE to words for MEN and to words for WOMEN using the same mixed-effects linear regression described in Study 1. With this different set of word embeddings, we replicated Study 1 and found that words for PEOPLE were more similar in their use to words for MEN (M = 0.19, SD = 0.06) than to words for WOMEN (M = 0.15, SD = 0.04), B = 0.04, SE < 0.01, p < .001, d = 0.67.

Replication of Study 2A
To test our first prediction that, overall, trait words would be more similar in their usage to words for MEN than to words for WOMEN, we used the same multilevel model described in Study 2A. We replicated Study 2A and found that trait words were more similar in their usage to words for MEN (M = 0.14, SD = 0.06) than to words for WOMEN (M = 0.13, SD = 0.06), B = 0.02, SE < 0.01, p < .001, d = 0.26.
To test our second prediction that there would be an asymmetry in gender-stereotypical associations reflected in collective concepts, we conducted the same mixed-effects linear regression described in Study 2A. We again replicated Study 2A and found that the similarity between the words for MEN and WOMEN and the trait words depended on the gender stereotypicality of the traits (i.e., there was an interaction), B = 0.03, SE < 0.01, p < .001. Specifically, words for MEN did not differ significantly in their similarity to traits stereotypical of men (M = 0.16, SD = 0.06) and to traits stereotypical of women (M = 0.16, SD = 0.06), B < 0.01, SE = 0.01, p = .650, d = 0.07. In contrast, words for WOMEN were more similar to traits stereotypical of women (M = 0.15, SD = 0.06) than to traits stereotypical of men (M = 0.13, SD = 0.05), B = -0.02, SE < 0.01, p = .033, d = -0.35.

Replication of Study 2B
We note one departure from the preregistration of this replication study. The preregistration indicates that we will test 180 traits; however, in the present replication study (as in Study 2B), we analyzed 178 traits because we removed the traits "feminine" and "masculine," which appeared in our list of gender words (Table S2). This was the only departure from the preregistration.
To test our first prediction that, overall, trait words would be more similar in their usage to words for MEN than to words for WOMEN, we used the same mixed-effects linear regression described in Study 2B. We replicated Study 2B and found that trait words were more similar to words for MEN (M = 0.16, SD = 0.06) than to words for WOMEN (M = 0.15, SD = 0.06), B = 0.02, SE < 0.01, p < .001, d = 0.28.
To test our second prediction that there would be an asymmetry in gender-stereotypical associations reflected in collective concepts, we conducted the same mixed-effects linear regression described in Study 2B. We again replicated Study 2B and found that the similarity between the words for MEN and words for WOMEN and the trait words depended on gender stereotypicality of the traits (i.e., there was an interaction), B = 0.02, SE < 0.01, p < .001. Specifically, words for MEN did not differ significantly in their similarity to traits stereotypical of men (M = 0.16, SD = 0.06) and to traits stereotypical of women (M = 0.17, SD = 0.06), B = -0.01, SE = 0.01, p = .237, d = -0.17. In contrast, words for WOMEN were more similar to traits stereotypical of women (M = 0.16, SD = 0.06) than to traits stereotypical of men (M = 0.13, SD = 0.05), B = -0.03, SE = 0.01, p < .001, d = -0.55.

Replication of Study 3
To test our first prediction that, overall, verbs would be more similar in their usage to words for MEN than to words for WOMEN, we used the same mixed-effects linear regression described in Study 3. We replicated Study 3 and found that verbs were more similar to words for MEN (M = 0.16, SD = 0.06) than to words for WOMEN (M = 0.14, SD = 0.06), B = 0.02, SE < 0.01, p < .001, d = 0.40.
To test our second prediction that there would be an asymmetry in gender-stereotypical associations reflected in collective concepts, we conducted the same mixed-effects linear regression described in Study 3. We again replicated Study 3 and found that the similarity between the words for MEN and WOMEN and the verbs depended on the gender bias of the verbs (i.e., there was an interaction), B = 0.02, SE < 0.01, p < .001. As in Study 3, words for WOMEN were more similar to female-biased verbs (M = 0.15, SD = 0.06) than to male-biased verbs (M = 0.12, SD = 0.05), B = -0.04, SE = 0.01, p < .001, d = -0.66. Unlike Study 3, we also found that words for MEN were more similar to female-biased verbs (M = 0.17, SD = 0.06) than to male-biased verbs (M = 0.15, SD = 0.05), B = -0.02, SE = 0.01, p = .008, d = -0.35, but note that this effect for words for MEN was significantly weaker than the same effect for words for WOMEN given the significant interaction. This last finding about words for MEN is a minor departure from Study 3, but the overall pattern of results is consistent between the two studies because there was again evidence for an asymmetry in gender-stereotypical associations and specifically for stronger gender-stereotypical associations about WOMEN than about MEN in collective concepts.

Overview of Control Analyses and Robustness Checks
The results of Studies 1-3 were robust to a variety of control analyses and robustness checks. These included the following, each of which is described in greater detail below and was preregistered for the replication studies: (a) in Study 1, adding weights to the analysis such that the words for PEOPLE that were rated as more representative of the concept by coders were weighted more heavily; (b) in Studies 1-3, removing masculine generic words and their counterparts and recomputing the analyses; (c) in Studies 1-3, conducting "leave one out" analyses for the key result; (d) in Studies 1-3, conducting a permutation test of the key result; (e) relevant to Studies 1-3, testing for potential differences in word frequencies of the gender words; and (f) in Studies 2A, 2B, and 3, conducting word-embedding association tests (WEAT). Finally, we also (g) tested the generalizability of the critical findings in Study 1 to a more specialized domain. We replicated the results of Study 1 in the biomedical domain using word embeddings trained on biomedical and clinical text instead of undifferentiated text on the internet (i.e., the Common Crawl, which was the basis of the studies reported in the main text).

A. Weighted Analysis (Study 1 and Replication)
We conducted a supplementary analysis in which words that were rated by coders as more fitting or representative of the concept PEOPLE were weighted more heavily in the analysis. This was done just in Study 1 because the list of words for PEOPLE was generated for the purposes of this study and was relatively small compared to the list of traits and verbs in Studies 2 and 3, respectively.
As described in detail in the Materials and Methods section of the main text, six coders who were unaware of our hypotheses rated each of the words for PEOPLE on their fit with the underlying concept. We standardized these scores, added a constant (so that they are all positive), and then used these as level-2 (i.e., PEOPLE word-level) weights in the same mixed-effects model described previously. For the two category words added after the coding step ("beings" and "group"), for which we did not have ratings of fit with the concept, we used the average rating of all PEOPLE words because weighted analyses do not permit missing weight values.
In the weighted analysis for Study 1, we again found that words for PEOPLE were more similar in their usage to words for MEN (M = 0.16, SD = 0.04) than to words for WOMEN (M = 0.14, SD = 0.04), B = 0.02, SE < 0.01, p < .001, d = 0.49. In the preregistered replication of Study 1 using this weighted analysis, we also again found that the words for PEOPLE were more similar in their usage to words for MEN (M = 0.19, SD = 0.06) than to words for WOMEN (M = 0.15, SD = 0.04), B = 0.04, SE < 0.01, p < .001, d = 0.72.
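The construction of the weights described above (mean imputation for unrated words, standardization, and a positive shift) can be sketched as follows. The ratings shown are hypothetical, the function name `rating_weights` is ours, and the size of the added constant is not stated in the text, so the choice here (shifting the smallest weight to 1.0) is purely an assumption.

```python
import statistics

def rating_weights(ratings, shift_margin=1.0):
    """Turn fit ratings into strictly positive word-level weights.

    ratings maps each word to a mean coder rating, or None if unrated.
    """
    rated = [r for r in ratings.values() if r is not None]
    mean_rating = statistics.mean(rated)
    # Unrated words receive the average rating of the rated words.
    filled = {w: (mean_rating if r is None else r) for w, r in ratings.items()}
    values = list(filled.values())
    mu = statistics.mean(values)
    sd = statistics.stdev(values)
    z = {w: (r - mu) / sd for w, r in filled.items()}
    # Assumed constant: just large enough that the smallest weight equals shift_margin.
    constant = -min(z.values()) + shift_margin
    return {w: score + constant for w, score in z.items()}

# Hypothetical ratings; "beings" and "group" were added after the rating study.
ratings = {"people": 6.5, "person": 6.8, "humans": 6.0, "beings": None, "group": None}
weights = rating_weights(ratings)
for word, weight in weights.items():
    print(word, round(weight, 3))
```

Words rated as better fits receive larger weights, and the two unrated words receive the same middling weight.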

B. Masculine Generic Analyses (All Studies)
Some of the words for MEN in our list of gender words (Table S2) are also commonly used to generically refer to people of all genders. For instance, it is common when referring to a person in general to use "he" but not "she" (27). These words are known as masculine generics. It was important to rule out the possibility that the results we observed in the present study were merely an artifact of having these masculine generic words in our word lists, which could have artificially inflated the similarity of MEN words and PEOPLE words.
To investigate this alternative explanation, we removed masculine generic words as well as parallel woman-specific ones from our lists (i.e., "he," "hes," "him," "himself," "his," "man," and "man's"; "she," "shes," "her," "herself," "hers," "woman," and "woman's") and re-ran all analyses for Studies 1-3 and their preregistered replications. All results across all studies were robust to removing masculine generic words (for details, see Tables S6 and S7). Thus, the findings reported in the main text are not merely due to the inclusion of masculine generics among the words for MEN in our list of gender words.

C. "Leave One Out" Analyses (All Studies)
In addition to specifically considering masculine generic words, it was important to rule out the possibility that the results of the present studies were overly reliant on any particular word. To do so, we conducted so-called "leave one out" analyses.
For these analyses, we focused on the difference in similarity between words for MEN vs. words for WOMEN and words for PEOPLE (Study 1), trait words (Studies 2A and 2B), and verbs (Study 3). (That is, we did not examine interactions with gender stereotypicality from Studies 2A, 2B, and 3.) For example, in Study 1 we re-computed the model described above 104 times, each time leaving out a single word for PEOPLE, a single word for WOMEN, or a single word for MEN. The resulting effect sizes for the difference in similarity between words for MEN vs. words for WOMEN with words for PEOPLE for each of these iterations are presented in Fig. S1. For analogous effect sizes for Studies 2A, 2B, and 3, see Figs. S2, S3, and S4, respectively. Visual inspection of these plots suggests that leaving out certain words sometimes resulted in smaller or larger effect sizes, but the effect sizes were generally quite consistent.
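The logic of a "leave one out" analysis can be sketched as below. This is a simplified stand-in: it recomputes a pooled-SD Cohen's d on toy pairwise similarities rather than refitting the mixed-effects model, and all similarity values, word lists, and function names are hypothetical.

```python
import statistics

def cohens_d(a, b):
    """Standardized mean difference using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / pooled_var ** 0.5

def effect_without(similarity, gender_of, left_out=None):
    """Effect size after dropping every pair that involves the left-out word."""
    men, women = [], []
    for (target, gender_word), sim in similarity.items():
        if left_out in (target, gender_word):
            continue
        (men if gender_of[gender_word] == "M" else women).append(sim)
    return cohens_d(men, women)

# Hypothetical cosine similarities for (PEOPLE word, gender word) pairs.
similarity = {
    ("people", "he"): 0.20, ("people", "him"): 0.18,
    ("people", "she"): 0.15, ("people", "her"): 0.14,
    ("person", "he"): 0.22, ("person", "him"): 0.17,
    ("person", "she"): 0.16, ("person", "her"): 0.15,
    ("humans", "he"): 0.15, ("humans", "him"): 0.14,
    ("humans", "she"): 0.12, ("humans", "her"): 0.13,
}
gender_of = {"he": "M", "him": "M", "she": "W", "her": "W"}

original = effect_without(similarity, gender_of)
words = sorted({w for pair in similarity for w in pair})
leave_one_out = {w: effect_without(similarity, gender_of, w) for w in words}
print("original", round(original, 2))
for word, d in leave_one_out.items():
    print(word, round(d, 2))
```

If every left-out effect size keeps the same sign and a similar magnitude as the original, no single word is driving the result, which is the pattern the figures are meant to show.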

Fig. S1
The Difference Between Gender Words When Each Person Word and Each Gender Word is Omitted in Study 1 (Top) and its Replication (Bottom)
Note. "Original" refers to the magnitude of the effect size in the original model when all words were included. For readability, only gender words with the most extreme influence on the original effect size in either direction are depicted.

Fig. S2
The Difference Between Gender Words When Each Trait and Each Gender Word is Omitted in Study 2A (Top) and its Replication (Bottom)
Note. "Original" refers to the magnitude of the effect size in the original model when all words were included. For readability, only gender words with the most extreme influence on the original effect size in either direction are depicted.

Fig. S3
The Difference Between Gender Words When Each Trait and Each Gender Word is Omitted in Study 2B (Top) and its Replication (Bottom)
Note. "Original" refers to the magnitude of the effect size in the original model when all words were included. For readability, only gender words with the most extreme influence on the original effect size in either direction are depicted.

Fig. S4
The Difference Between Gender Words When Each Verb and Each Gender Word is Omitted in Study 3 (Top) and its Replication (Bottom)
Note. "Original" refers to the magnitude of the effect size in the original model when all words were included. For readability, only gender words with the most extreme influence on the original effect size in either direction are depicted.

D. Random Permutation Tests (All Studies)
To again ensure that the results were not overly reliant on particular gender words, we also conducted random permutation tests of the key result. Permutation tests are nonparametric and do not rely on any particular assumptions about the distribution of the data. For these analyses, we again focused on the difference in similarity between words for MEN vs. words for WOMEN and words for PEOPLE (Study 1), trait words (Studies 2A and 2B), and verbs (Study 3). (That is, we did not examine interactions with gender stereotypicality from Studies 2A, 2B, and 3.) Taking Study 1 as an example, the permutation test involved recomputing the multilevel model described above 10,000 times, each time randomly shuffling the gender to which each word was assigned (e.g., "he" was randomly designated as a word for WOMEN or as a word for MEN). This procedure was repeated for Studies 2A, 2B, and 3.
This process created data-driven estimates of the null distributions of effect sizes and facilitated a comparison between the null distributions and the observed effects. If any particular gender word or subset of gender words was responsible for the observed effects, then the effect sizes resulting from some of the permutations would be similar to our observed effect sizes. Instead, we consistently found that our observed effect sizes were noticeably larger than the null distributions of effect sizes. Thus, these random permutation tests provide converging evidence that words for PEOPLE (Study 1), trait words (Studies 2A and 2B), and verbs (Study 3) were all more similar to words for MEN than to words for WOMEN (all p's < .001 for both the main studies and replication studies).
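A label-shuffling permutation test of this kind can be sketched as follows. This is not the procedure used in the studies in full: the statistic here is a simple difference in means rather than the output of the multilevel model, the shuffle keeps the two gender sets the same size (an assumption), and the per-word similarity values are hypothetical.

```python
import random

def mean_diff(values, labels):
    """Mean of values labeled "M" minus mean of values labeled "W"."""
    men = [v for v, g in zip(values, labels) if g == "M"]
    women = [v for v, g in zip(values, labels) if g == "W"]
    return sum(men) / len(men) - sum(women) / len(women)

def permutation_test(values, labels, n_permutations=10_000, seed=0):
    """Two-sided permutation p value for the difference in means."""
    rng = random.Random(seed)
    observed = mean_diff(values, labels)
    shuffled = list(labels)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # reassign gender labels at random
        if abs(mean_diff(values, shuffled)) >= abs(observed):
            extreme += 1
    # Add-one correction so the estimated p value is never exactly zero.
    return observed, (extreme + 1) / (n_permutations + 1)

# Hypothetical mean similarity of each gender word to the words for PEOPLE.
values = [0.20, 0.19, 0.18, 0.17, 0.15, 0.14, 0.13, 0.12]
labels = ["M", "M", "M", "M", "W", "W", "W", "W"]

observed, p_value = permutation_test(values, labels)
print(round(observed, 3), round(p_value, 3))
```

With only eight toy words the smallest attainable p value is limited by the number of distinct label assignments; the full word lists allow far finer resolution.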

E. Frequency Analysis of the Gender Words (All Studies)
We tested potential differences in the frequency of the words for WOMEN and the words for MEN in the training corpus (Common Crawl) used by both fastText (Studies 1-3) and GloVe (replications of Studies 1-3). Although we took care to create lists of words for WOMEN and words for MEN that were parallel in terms of their meaning and syntax, these two sets of gender words may nevertheless differ in terms of frequency. Word embeddings are somewhat sensitive to frequency (57), and thus it was important to consider this possibility.
To measure frequency, we used information from fastText, which provides the frequency rank of each word in the Common Crawl corpus. (GloVe does not supply frequency information, but both of these algorithms use the same corpus, so the frequency ranks should be extremely similar.) The most frequent word in the Common Crawl is ranked as 1, the next most frequent word as 2, and so on. Although this frequency information is encoded as ranks (rather than exact frequencies), this metric is relatively precise because of the massive scale of the corpus (i.e., over 630 billion word tokens). This rank data also has the benefit of being based on the same information that the word embeddings themselves were based on.
To test for potential frequency differences between our two sets of gender words, we computed a Mann-Whitney U test, which is appropriate for rank data, but found no evidence for a difference between the frequency ranks of words for MEN (M = 426,964.20, SD = 1,137,915.00, Median = 19,873.00) and words for WOMEN (M = 460,639.70, SD = 1,109,425.00, Median = 26,369.50), U = 760, p = .416, d = -0.03.
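For readers unfamiliar with the Mann-Whitney U test, a minimal sketch follows. It uses average ranks for ties and a normal approximation without tie or continuity correction, which is an assumption on our part and may differ from the software used for the reported result. The frequency ranks in the example are hypothetical.

```python
import math

def mann_whitney_u(x, y):
    """Return (U for x, U for y, two-sided p from a normal approximation)."""
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        # Extend j over a run of tied values and give them their average rank.
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    n1, n2 = len(x), len(y)
    u1 = rank_sum_x - n1 * (n1 + 1) / 2.0
    u2 = n1 * n2 - u1
    mu = n1 * n2 / 2.0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u1 - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return u1, u2, p

# Hypothetical corpus frequency ranks (1 = most frequent word).
men_ranks = [35, 120, 410, 20000, 250000]
women_ranks = [60, 150, 700, 26000, 300000]

u_men, u_women, p = mann_whitney_u(men_ranks, women_ranks)
print(u_men, u_women, round(p, 3))
```

Because the test operates on the ordering of the values only, it is unaffected by the extreme skew that is typical of frequency ranks.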

F. Word-Embedding Association Tests (Studies 2 and 3 and Replications)
Prior investigations of gender-stereotypical associations in word embeddings conducted a word-embedding association test (WEAT; 21). This test was designed to be conceptually analogous to a common measure of human stereotypes from the psychology literature: the implicit association test (IAT; 41). Because both the WEAT and the IAT rely on a double difference score (see details below), they obscure the asymmetry in gender-stereotypical associations we predicted and found in the present research. To compare the present data to previous investigations of gender-stereotypical associations in word embeddings, we conducted a WEAT of gender-stereotypical associations with traits and verbs in Studies 2A, 2B, and 3. We expected to conceptually replicate previous work and find evidence for gender-stereotypical associations in word embeddings.
In Studies 2A and 2B, the WEAT involves first calculating the mean similarity of each trait word to each of the words for WOMEN and, separately, each of the words for MEN and then averaging within gender set. Next, a difference score is calculated between the average similarity of each trait word with words for MEN and words for WOMEN. For traits stereotypical of women, higher difference scores would indicate less bias in line with gender-stereotypical associations (i.e., traits stereotypical of women are more similar to words for MEN than to words for WOMEN). For traits stereotypical of men, though, higher difference scores would indicate more bias in line with gender-stereotypical associations (i.e., traits stereotypical of men are more similar to words for MEN than to words for WOMEN). The next step is to sum these difference scores for the traits stereotypical of men and, separately, for the traits stereotypical of women. The final step is then to compute a difference score of these sums. The resulting single number quantifies the extent to which the similarities between trait words and gender words are more in line with gender-stereotypical associations than not. A p value can then be obtained by conducting a two-tailed random permutation test.
Formally in the present case, let X represent our set of traits stereotypical of men and Y represent our set of traits stereotypical of women (called target words by the original authors; 23). Let M and W represent our set of words for MEN and words for WOMEN, respectively (called attribute words by the original authors; 23). Let $\cos(\vec{t}, \vec{w})$ represent the cosine of the angle between the word embedding of a given trait word and, in this case, the embedding of a given word for WOMEN. The WEAT test statistic is

$$s(X, Y, M, W) = \sum_{x \in X} s(x, M, W) - \sum_{y \in Y} s(y, M, W),$$

where

$$s(t, M, W) = \operatorname{mean}_{m \in M} \cos(\vec{t}, \vec{m}) - \operatorname{mean}_{w \in W} \cos(\vec{t}, \vec{w}).$$

Applying this test to our data in Studies 2A and 2B, we found greater relative associations between words for MEN and traits stereotypical of men and words for WOMEN and traits stereotypical of women than the inverse (Table S8). We also applied this test to our data in Study 3 and to the replications of Studies 2A, 2B, and 3 and found similar results. Thus, our data are consistent with previous investigations of gender-stereotypical associations in word embeddings.
For instance, reference 21 found that women are associated with the arts and men are associated with the sciences compared to the inverse set of associations (d = 1.24). Similarly, we found that women were associated with certain female-stereotypical traits and verbs (e.g., "compassionate") and men were associated with certain male-stereotypical traits and verbs (e.g., "brave") more than the inverse (d range: 0.64-0.89). Crucially, our analyses in the main text show that gender-stereotypical associations were driven by associations about women, not men. Because the WEAT relies on two difference scores, it obscures the asymmetry that we predicted and found.
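The WEAT procedure described in this section (a per-trait difference of mean similarities, summed within each stereotype set and then differenced) can be sketched as follows. The two-dimensional vectors are hypothetical, the function names are ours, and the effect size shown (difference of set means divided by the sample standard deviation of all per-trait scores) follows a common convention that we assume here rather than take from the text.

```python
import math
import statistics

def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def association(t, men, women):
    """Per-trait score: mean cosine with MEN words minus mean cosine with WOMEN words."""
    return (sum(cosine(t, m) for m in men) / len(men)
            - sum(cosine(t, w) for w in women) / len(women))

def weat(x_traits, y_traits, men, women):
    """Return the WEAT test statistic and an effect size (assumed convention)."""
    sx = [association(t, men, women) for t in x_traits]
    sy = [association(t, men, women) for t in y_traits]
    statistic = sum(sx) - sum(sy)
    effect = (statistics.mean(sx) - statistics.mean(sy)) / statistics.stdev(sx + sy)
    return statistic, effect

# Hypothetical toy embeddings.
men = [[1.0, 0.1], [0.9, 0.2]]          # words for MEN
women = [[0.1, 1.0], [0.2, 0.9]]        # words for WOMEN
x_traits = [[1.0, 0.3], [0.8, 0.4]]     # traits stereotypical of men
y_traits = [[0.3, 1.0], [0.4, 0.8]]     # traits stereotypical of women

statistic, effect = weat(x_traits, y_traits, men, women)
print(round(statistic, 3), round(effect, 3))
```

Note that the statistic collapses both gender sets and both trait sets into one number, so it cannot reveal whether the association is carried mainly by the words for WOMEN or by the words for MEN.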

G. Replication of Study 1 in the Biomedical Domain
We conducted another replication of Study 1 using identical lists of words and other procedures to Study 1, with one exception: We used a different set of word embeddings (53). The goal of this replication was to test the generalizability of the critical PEOPLE = MEN finding from Study 1 in the biomedical domain.
Similar to the word embeddings analyzed in the main text, these biomedical word embeddings were extracted with the fastText algorithm with 200 dimensions (6). But rather than being trained on an undifferentiated corpus of 630+ billion words on the internet (i.e., the Common Crawl corpus), the biomedical embeddings were trained on a smaller corpus of biomedical text: specifically, 4+ billion words from abstracts and titles in the PubMed biomedical and life science research archive and 500+ million words in the MIMIC-III Clinical Database of de-identified hospital clinical notes (vital sign measurements, laboratory test results, procedures, medications, etc.; 87). The biomedical domain was of particular interest because biomedical research and clinical practice have direct implications for gender (in)equity in health, and it would thus be particularly troubling to find a PEOPLE = MEN bias in this domain.
We compared words for PEOPLE to words for MEN and to words for WOMEN using the same mixed-effects linear regression described in Study 1. Replicating Study 1, we found that words for PEOPLE were more similar in their use to words for MEN (M = 0.08, SD = 0.06) than to words for WOMEN (M = 0.05, SD = 0.04), B = 0.03, SE < 0.01, p < .001, d = 0.49. The effect size in this biomedical domain (i.e., d = 0.49) is similar to that in Study 1 reported in the main text (i.e., d = 0.47), demonstrating the generalizability of the present finding to this different domain based on a different corpus.

Table S1
List of Words for PEOPLE in Study 1 With Average Fit Ratings

Table S2
List of Words for WOMEN and Words for MEN in Studies 1-3 With Average Fit Ratings

Table S3
List of Trait Words in Study 2A With Gender Stereotypicality Designations
Note. Traits from reference 46. a Designated as stereotypical of men. b Designated as stereotypical of women.

Table S4
List of Trait Words in Study 2B With Gender Stereotypicality Designations

Table S5
List of Verbs in Study 3 With Gender-Bias Designations, Valence, and Position
Note. Verbs from reference 51. a Designated as female-biased. b Designated as male-biased.

Table S7
The Interactions Between Gender Words and Gender Stereotypicality in Studies 2 and 3 and Replications Without the Masculine Generic Words and in the Original Results

Table S8
WEAT Statistics in Studies 2 and 3 and Replications